Visually Situated Language Comprehension